Self-correcting Q-learning

Authors

Abstract

The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.
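
The two biases the abstract contrasts can be illustrated with a small simulation. The sketch below is not the paper's self-correcting estimator; it only compares the single estimator (maximum of sample means) with a double estimator (select the action on one half of the samples, evaluate it on the other half). The function names, the Gaussian reward model, the true means, and the sample sizes are all assumptions chosen for illustration.

```python
import random
from statistics import fmean


def single_estimator(samples):
    """Max of per-action sample means; tends to overestimate max_a E[X_a]."""
    return max(fmean(s) for s in samples)


def double_estimator(samples):
    """Pick the best action on one half, evaluate it on the other half.

    This tends to underestimate max_a E[X_a] when the means differ.
    """
    half_a = [s[: len(s) // 2] for s in samples]
    half_b = [s[len(s) // 2:] for s in samples]
    means_a = [fmean(s) for s in half_a]
    best = max(range(len(samples)), key=lambda i: means_a[i])
    return fmean(half_b[best])


def simulate(true_means, n_samples=10, n_trials=2000, seed=0):
    """Average both estimators over many trials (hypothetical setup)."""
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(n_trials):
        # Assumed reward model: unit-variance Gaussian noise around each mean.
        samples = [
            [rng.gauss(mu, 1.0) for _ in range(n_samples)] for mu in true_means
        ]
        single.append(single_estimator(samples))
        double.append(double_estimator(samples))
    return fmean(single), fmean(double)


if __name__ == "__main__":
    means = [0.0, 0.0, 0.0, 0.0, 0.2]  # true maximum expected value is 0.2
    avg_single, avg_double = simulate(means)
    print(f"true max: {max(means):.3f}")
    print(f"single estimator average: {avg_single:.3f}")
    print(f"double estimator average: {avg_double:.3f}")
```

Under these assumptions the single-estimator average lands above the true maximum and the double-estimator average lands below it, which is the gap that an estimator balancing the two would aim to close.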

Similar resources

Theory of self-learning Q-matrix

Cognitive assessment is a growing area in psychological and educational measurement, where tests are given to assess mastery/deficiency of attributes or skills. A key issue is the correct identification of attributes associated with items in a test. In this paper, we set up a mathematical framework under which theoretical properties may be discussed. We establish sufficient conditions to ensure...

Correcting the “self-correcting” mythos of science

In standard characterizations, science is self-correcting. Scientists examine each other’s work skeptically, try to replicate important discoveries, and thereby expose latent errors. Thus, while science is tentative, it also seems to have a system for correcting whatever mistakes arise. It powerfully explains and justifies the authority of science. Self-correction thus often serves emblematical...

Self-Correcting Models for Model-Based Reinforcement Learning

When an agent cannot represent a perfectly accurate model of its environment’s dynamics, model-based reinforcement learning (MBRL) can fail catastrophically. Planning involves composing the predictions of the model; when flawed predictions are composed, even minor errors can compound and render the model useless for planning. Hallucinated Replay (Talvitie 2014) trains the model to “correct” its...

A Self-Correcting Projector

We describe a calibration and rendering technique for a projector that can render rectangular images under keystoned position. The projector utilizes a rigidly attached camera to form a stereo pair. We describe a very easy to use technique for calibration of the projector-camera pair using only black planar surfaces. We present an efficient rendering method to pre-warp images so that they appea...

Self-correcting quantum computers

Is the notion of a quantum computer (QC) resilient to thermal noise unphysical? We address this question from a constructive perspective and show that local quantum Hamiltonian models provide self-correcting QCs. To this end, we first give a sufficient condition on the connectedness of excitations for a stabilizer code model to be a self-correcting quantum memory. We then study the two main exa...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i12.17334